Language Adaptation for Extending Post-Editing Estimates for Closely Related Languages
Abstract
This paper presents an open-source toolkit for predicting human post-editing effort for closely related languages. Training resources for the Quality Estimation task are currently available for very few language directions and domains. These resources can be extended under the assumption that MT errors, and the amount of post-editing required to correct them, are comparable across related languages even if the feature frequencies differ. The toolkit achieves language adaptation by learning a new feature representation with transfer learning methods. In particular, we report the performance of a method based on Self-Taught Learning that adapts the English-Spanish pair to produce Quality Estimation models for translation from English into Portuguese, Italian and other Romance languages, using the publicly available Autodesk dataset.
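The adaptation idea in the abstract can be illustrated with a minimal sketch. It is not the toolkit's implementation: the abstract does not specify the features or the learner, so the data here is synthetic, every name is hypothetical, and a single principal direction (found by power iteration) stands in for the sparse-coding basis usually associated with Self-Taught Learning. The pipeline learns a representation from unlabeled target-language features, re-encodes the labeled source-language data in it, fits a regressor for post-editing effort, and applies that regressor to target-language segments.

```python
import math
import random

def column_means(rows):
    """Mean of each feature column."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def center(rows, mean):
    """Subtract the given column means from every row."""
    return [[x - m for x, m in zip(r, mean)] for r in rows]

def top_direction(rows, iters=200):
    """Leading principal direction of centered rows, by power iteration."""
    dim = len(rows[0])
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        # w = X^T (X v): the (unnormalized) covariance matrix applied to v
        proj = [sum(a * b for a, b in zip(r, v)) for r in rows]
        w = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def encode(rows, mean, direction):
    """Project rows onto the learned direction (the new 1-D representation)."""
    return [sum(a * b for a, b in zip(r, direction)) for r in center(rows, mean)]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

# --- Synthetic stand-in data (assumption: not the Autodesk dataset) ---
rng = random.Random(0)
loading = [1.0, 0.8, 0.5]  # hypothetical link between latent quality and features

def make_rows(n):
    """Return latent quality values and noisy 3-feature rows derived from them."""
    zs = [rng.gauss(0, 1) for _ in range(n)]
    rows = [[z * w + rng.gauss(0, 0.1) for w in loading] for z in zs]
    return zs, rows

def effort(z):
    """Hypothetical post-editing effort score for a segment."""
    return 0.4 + 0.1 * z + rng.gauss(0, 0.01)

_, target_unlabeled = make_rows(200)           # e.g. English-Portuguese, no labels
src_z, source_rows = make_rows(100)            # e.g. English-Spanish, labeled
source_effort = [effort(z) for z in src_z]
test_z, test_rows = make_rows(50)              # held-out target-language segments
test_effort = [effort(z) for z in test_z]

# 1. Learn the representation from unlabeled target-language features.
mean = column_means(target_unlabeled)
direction = top_direction(center(target_unlabeled, mean))

# 2. Re-encode labeled source data and fit the effort regressor on it.
slope, intercept = fit_line(encode(source_rows, mean, direction), source_effort)

# 3. Predict effort for target-language segments in the same representation.
predictions = [slope * c + intercept for c in encode(test_rows, mean, direction)]
mae = sum(abs(p - y) for p, y in zip(predictions, test_effort)) / len(test_effort)
print("mean absolute error on target segments:", round(mae, 4))
```

The sketch works only because the synthetic source and target features share the same underlying structure, which mirrors the abstract's assumption that errors and post-editing effort are comparable across related languages.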
Related resources
Identity and Representation through Language in Ghana: The Postcolonial Self and the Other
Research related to colonialism and post colonialism shows how the identities of indigenous people were constructed and how these identities are reconstructed in our contemporary world. The thrust of this paper is that colonialism brought a shift in the linguistic structure of Ghana with the introduction of the use of English among Ghanaians. The coexistence of both Ghanaian languages and Engli...
Depfix, a Tool for Automatic Rule-based Post-editing of SMT
We present Depfix, an open-source system for automatic post-editing of phrase-based machine translation outputs. Depfix employs a range of natural language processing tools to obtain analyses of the input sentences, and uses a set of rules to correct common or serious errors in machine translation outputs. Depfix is currently implemented only for English-to-Czech translation direction, but exte...
English Language Teaching Material Development
The goal of language programs is to utilize language for effective communication. Due to the needs, interests, and motivations of language learners, they may show individual differences in their language learning. Materials used in language programs can be instructional, experiential, elucidative, or exploratory in that they can inform learners about the language, provide experience of the la...
Constraint-Based Bilingual Lexicon Induction for Closely Related Languages
The lack or absence of parallel and comparable corpora makes bilingual lexicon extraction a difficult task for low-resource languages. Pivot language and cognate recognition approaches have proven useful for inducing bilingual lexicons for such languages. We analyze the features of closely related languages and define a semantic constraint assumption. Based on the assumption, we propose...
Predicting Machine Translation Adequacy
As Machine Translation (MT) becomes more popular among end-users, an increasingly relevant issue is that of estimating the quality of automatic translations for a particular task. The main application for such quality estimates has been selecting good enough translations for human post-editing. The end-users, in this case, are fluent speakers of both source and target languages and the quality e...